Development and validation of an individual-based state-transition model for the prediction of frailty and frailty-related events

Frailty is a biological syndrome that is associated with increased risks of morbidity and mortality. To assess the value of interventions to prevent or manage frailty, all important impacts on costs and outcomes should be estimated. The aim of this study is to describe the development and validation of an individual-based state transition model that predicts the incidence and progression of frailty and frailty-related events over the remaining lifetime of older Australians. An individual-based state transition simulation model comprising integrated sub models that represent the occurrence of seven events (mortality, hip fracture, falls, admission to hospital, delirium, physical disability, and transitioning to residential care) was developed. The initial parameterisation used data from the Survey of Health, Ageing, and Retirement in Europe (SHARE). The model was then calibrated for an Australian population using data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey. The simulation model established internal validity with respect to predicting outcomes at 24 months for the SHARE population. Calibration was required to predict longer terms outcomes at 48 months in the SHARE and HILDA data. Using probabilistic calibration methods, over 1,000 sampled sets of input parameter met the convergence criteria across six external calibration targets. The developed model provides a tool for predicting frailty and frailty-related events in a representative community dwelling Australian population aged over 65 years and provides the basis for economic evaluation of frailty-focussed interventions. Calibration to outcomes observed over an extended time horizon would improve model validity.


Introduction
Frailty is a common progressive condition among older people (aged 65 or more) with a significant impact on demand for healthcare services. With an ageing population, frailty is becoming a worldwide health concern [1][2][3][4]. There are two broad approaches used to measure frailty status: a Frailty Phenotype (FP) and a Frailty Index (FI) [5]. The FI approach measures frailty as a cumulation of deficits across a large range of tests, whereas Frailty Phenotype conceptualises frailty as physiologic syndrome which is present when three or more of five deficits are present [6]. The five deficits are Weight loss, Exhaustion, Low physical activity, Grip strength and Slowness. A score of 0 is classified non-frail, a score of 1 or 2 is classified as pre-frail and a score of 3 or greater is considered frail. Pre-frailty also increases risk of adverse outcomes [6]. Globally, it has been estimated that between 4% and 17% of older people are frail [6]. A recent geospatial modelling study in Australia found that in 2016 over 11% (415,000) of the 3.7 million older people were frail with an additional 46% (1.7 million) at increased risk of becoming frail (pre-frail), representing an increase of 20.5% in the combined frail and pre-frail populations in Australia since 2011 [7]. Frailty confers an increased lifetime risk of important health outcomes/events such as hip fracture, physical disability, and delirium resulting in an increased use of healthcare resources including hospital admissions, doctor consultations, and pharmaceuticals [6,[8][9][10][11]. A recent systematic review has shown that frailty is a strong predictor of nursing home placement [12]. These consequences are significant from the perspective of frail people, their families/carers, health and aged care systems. There is evidence that frailty is potentially manageable and that timely interventions can reduce the risk of adverse events, delay frailty progression and improve frailty status [13][14][15][16]. A range of interventions, such as physiotherapy/exercise have been shown to be effective in managing frailty in older people [15,[17][18][19]. However, the associated costs of these interventions must be assessed to generate cost-effectiveness evidence as one of the key inputs required to inform public funding decisions in countries such as Australia and the UK. A few withintrial economic evaluations have failed to establish cost-effectiveness of frailty interventions [20,21], however, the time horizon of clinical trials may not be sufficient to capture all important costs and health outcomes of interventions.
The long-term analysis of costs and health outcomes associated with healthcare interventions to assess their value is achieved using decision analytic models. While there are models for predicting frailty and particular events such as mortality [22], using alternate measure for frailty [23], to our knowledge, to date there is only one decision analytic model that can be used to evaluate both costs and outcomes of frailty interventions beyond the duration of clinical studies [24]. Karnon et al. developed a cohort-based state transition (Markov) model to predict costs and quality-adjusted life years (QALYs) over the lifetime of a frail population. The model extrapolated the findings from a clinical trial of a physiotherapy-based intervention from a one-year to a lifetime horizon. Over a 12-month follow-up period, the clinical trial reported a statistically significant reduction in frailty prevalence in the intervention group, a difference in mean costs per patient of $2,145 between the control and intervention groups, but only a trivial difference in quality of life [21]. Using a Markov model to extrapolate costs and outcomes over a lifetime horizon, Karnon et al. estimated an incremental cost per additional QALY of $8,179.
The existing cohort-based Markov model for frailty captured only a limited range of health states, combining frailty-related mediators and outcomes; falls, fractures, transition to residential care and depression [24]. Subsequent research to define an optimal model structure for a frailty cost-effectiveness model found a more complex model structure is required to more accurately capture important differences in costs and outcomes [25]. Given the complexity of the developed model structure and the number of health states required, it is proposed that an individual-based model is preferred [26]. Such models are the preferred modelling choice when there is baseline heterogeneity in the disease being modelled, there are continuous disease markers, time-varying event markers, and prior events have an influence on subsequent event rates [27,28].
In this paper, we report on the development and validation of an individual-based state transition model that predicts the progression of frailty and frailty-related events over the remaining lifetime of older Australians. The model forms the basis for a cost-effectiveness model for the economic evaluation of interventions to prevent or better manage frailty in community dwelling older people in Australia.

Model overview
A comprehensive model conceptualisation process was undertaken by Afzali et al. [25] to propose a model structure for the cost-effectiveness analysis of frailty interventions. This process was informed by two steps, which involved a critical analysis of frailty-related clinical and economic literature followed by a Delphi study involving clinical experts to achieve consensus on the final proposed model structure. The study identified health states and events for which frailty is a strong independent risk factor as well as patient attributes which influence disease progression. The seven health states/events identified in the final round were hip fracture, falls, transition to residential care, hospital admission, physical disability, delirium, and death. The attributes identified were age, gender, education, frailty status, diabetes, depression, polypharmacy and stroke.
The flow of individuals through the simulation model is demonstrated in Fig 1. Individuals enter the model with a set of patient attributes, including a frailty phenotype score, and previous event history. These characteristics influence an individual's predicted risk of outcomes in each monthly cycle. Individuals may experience several events or none in any individual cycle. Patients' individual characteristics, including frailty status, and event history are then updated prior to beginning the next cycle. A monthly cycle was chosen since it was considered sufficient to provide a reasonable time horizon in relation to disease progression.

Model construction
The model was constructed using TreeAge software [29]. The model draws participants from a table of individuals with baseline characteristics. These characteristics are then used to predict movement through a combination of event pathways using risk equations at each branch. The model uses Monte Carlo simulations at each event node to reflect stochastic variability in patient trajectories.
The Monte Carlo simulation approach uses tracker variables to update the participants characteristics over time. Events are recorded at each branch by trackers which allow these events to be collected over the course of the model. Updates to patient characteristics and event history are also enabled using trackers and all of these are updated at the end of each cycle.

Model inputs
Data sources. Survey data was used because this is the only data available which captures all the events included in the model structure and allows the creation of a measure of frailty phenotype, a measure which has been used extensively in the frailty literature [24,[30][31][32][33]. The review undertaken by Afzali et al. [25] identified The Survey of Health, Ageing and Retirement in Europe (SHARE) as a relevant database to populate the model [34]. The SHARE database includes information on relevant health states, sociodemographic data, and relevant attributes [35]. SHARE comprises longitudinal cross-national data from approximately 140,000 individuals aged 50 years and older covering 28 European countries and Israel. To date, SHARE has collected and released eight panel waves (2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018)(2019)(2020)(2021), with participant interviews conducted approximately every two years. To inform the transition probabilities for events and stroke, polypharmacy and frailty status, we used data from waves 4 to 6 (2013 to 2017) [36][37][38][39][40][41][42][43]. This comprised a population of around 26,000 initial participants.
To construct our risk model for residential care, we also used data available from the Australian Institute of Health and Welfare (AIHW) GEN Aged Care Data [44,45] and the Australian Bureau of Statistics (ABS) [46] to estimate the proportions of different aged cohorts moving into residential care.
To conduct the external validation of the model we used waves 13 and 17 (years 2013 and 2017) of the Household, Income and Labour Dynamics in Australia (HILDA) Survey [47,48]. In the simulation model, the experience of a new stroke was assumed to impact the likelihood of future events for the remainder of participants' time horizon (i.e., the participant's attribute for stroke is permanently set to a value of 1). After experiencing any of the other events represented in the model, the corresponding attributes are set to a value of 1 for 24 cycles, after which, should there be no new event, the value reverts to zero. For participants entering the model (in cycle 0) with an event attribute set to a value of 1 (other than for stroke), the attribute reverts to a value of 0 after 12 months, unless a new event occurs in that time. A participant's frailty status is updated at the commencement of each cycle. Participants leave the model at age 100 years or when they die. https://doi.org/10.1371/journal.pone.0290567.g001 HILDA is a household-based panel study, which collects information on economic and personal well-being. Started in 2001, HILDA has released 20 annual waves to date. A health-specific questionnaire module has been run every 4 years beginning from wave 9. Using the HILDA dataset, we were provided with health data for around 2,700 participants who had completed both the wave 13 and the wave 17 surveys for the Health Module [49]. These waves were chosen to their proximity in time frame to the SHARE data waves which had been used to populate the model.
A detailed description of all the variables used to populate the model and to construct the four-factor modified Frailty phenotype in the SHARE and HILDA data are provided in the Appendices.
Data analysis. In our model, we used a four-level frailty status variable, representing 0, 1, 2 and 3 or more frailty phenotypes. Individuals with three to five phenotypes were combined into a single category due to the small numbers of observations in the SHARE data.
Derivation of risk equations: Transition probabilities for all events except death and residential care were estimated using multivariable logistic equations in which the dependent variables were the wave 5 recorded values for the event and the independent variables were selected from wave 4 characteristics and events, including the dependent variable event for wave 4. We used a stepwise approach through backwards elimination, beginning with a model which includes all predictors and previous events. Variables were retained in the models if the p value associated with the estimated coefficient was < 0.05. Probability of death was similarly estimated from a logistic regression model in which the dependent variable was death within 12 months.
Due to the very low experience of residential care in the SHARE data, we used a logistic model predicting the probability of transitioning to residential care as a function of age and frailty phenotype only using the SHARE data in wave 4 and 5 [24]. Odds ratios representing the likelihood of pre-frail and frail individuals transitioning to residential care compared to non-frail individuals were converted to relative risks, which were applied to annual probabilities of non-frail individuals transitioning to residential care for 5-year age groups. The probabilities for non-frail individuals transitioning were calibrated such that the aggregate numbers of non-frail, pre-frail and frail individuals transitioning to residential care equalled the observed numbers transitioning to residential care in each age group.
Frailty status was updated each cycle based on a multinomial logistic regression model, which predicted the probability of moving from one frailty phenotype level to another based on individuals' current frailty phenotype level, attributes and prior events. In the frailty status prediction model, age was entered as a factor variable in 5 year age bands, from ages 65 to 69 years, with the last category being age 85 years and above. Age was entered as a continuous variable for all other sub-models.
All analyses were performed using Stata v14. All the individual model specifications are provided in the Appendices.

Model performance evaluation
The face validity of the model structure and the choice of data sources was confirmed by experts through the process described in Afzali et al. [25]. Verification of the model involved a process of checking all formulae, model debugging and checking that the outputs were logical. Internal validation was performed by comparing the overlap in the 95% and 99% confidence intervals around the observed and predicted means for event incidence for individuals at 24 months and at 48 months with the observed mean and confidence intervals for the incidence of events in the SHARE wave 5 and wave 6 data for the wave 4 participants. We also checked whether the estimated mean incidence fell within 10% of the mean observed value for each event. We compared the model predictions for the rates of polypharmacy and the mean frailty phenotype score at 24 months and at 48 months using the same criteria. Due to growing evidence of gender differences in the progression of frailty and frailty characteristics [30,[50][51][52], we compared model predictions for individuals 65-75 years of age and for ages over 75 years separately for men and women.
We assessed the external validity of the model by using waves 13 and 17 of the HILDA data [47]. Individual participant baseline values for falls, hip fracture and delirium, for which there were no equivalent values in the HILDA data, were imputed using logistic associative models in the SHARE wave 4 data which predicted the probability of an individual reporting a history of these characteristics based on other variables in the wave 4 data. Using a random number generator to add stochastic variability, these predicted probabilities were then used to assign baseline values for these variables for each individual in the HILDA data.
We then compared model predicted outcomes to observed outcomes (excluding falls, hip fracture and delirium) in the HILDA data over 48 months for participants who completed both surveys 13 and 17. Pre-specified convergence criteria for the external validation targets required prediction of the events and attributes at 48 months to within the 95% confidence interval of the observed values in each subgroup for age and gender.
Model calibration. As our model did not meet all of the pre-specified convergence criteria for internal and external validation, model calibration was undertaken. To calibrate the simulation model to the HILDA data, we modified a method used by Kim and Thompson (2010), which involved the application of multipliers to selected coefficients in the predictive equations [53]. Due to the number of models and the number of variables in our models, we modified Kim and Thompson's approach to apply the multipliers directly to the transition probabilities, e.g., [multiplier for probability of death x model predicted probability of Death].
To estimate multipliers for each prediction model, we first used multinomial functions to randomly sample 2,000 sets of coefficient values across all prediction sub models using the coefficient and variance-covariance matrices for the originally specified model equations. The multipliers were adjusted incrementally until the mean outputs across the 2,000 model runs using the sampled sets of coefficient values met the convergence criteria for all external validation targets.
We then ran 2,000 model runs using the calibrated multipliers to identify sets of coefficient values that individually met the convergence criteria. The final model was then run again using the convergent parameter sets.

Results (Model performance)
The baseline characteristics of the participants in the SHARE and HILDA surveys are presented in Table 1. The HILDA population were slightly more male, slightly more educated, had a slightly lower rate of diabetes, slightly higher rates of depression, and a lower baseline mean level of frailty phenotype. The biggest difference between the populations were the rates for Polypharmacy. The Australian population had considerably higher rates of participants reporting that they were taking 5 or more drugs a week over the last 12 months. Table 2 shows frailty prevalence and mean frailty by age cohorts at baseline across SHARE and HILDA datasets. Table 2 also reports results from an Australian study of frailty as a comparator [33]. In general, frailty rates were lower in the HILDA data than in SHARE or Thompson. While there are around the same proportions of people classified as frail with a frailty phenotype of 3 or more in all three data sets, the HILDA data contains more non-frail and fewer people with 1 or 2 frailty phenotypes and mean frailty is lower in the HILDA population than in the SHARE population at all age groups.

Sub-models
The area under the receiver operating characteristics curve (ROC), also known as c-statistics [54], provides an aggregate measure of classification performance across classification thresholds.  Tables 3 and 4 show the model predictions of events and updated attributes at 24 months and at 48 months for the wave 4 SHARE population compared to the observed values for that population in wave 5 and wave 6 for males and females respectively. There was overlap in the 95% confidence intervals for the predicted and observed means for all events and updates, except stroke in males aged 65-75 and hospital admissions and mean frailty for females aged over 75. There was overlap in the 99% confidence intervals for these events, but the predicted mean did not lie within 10% of the observed mean. At 48 months, however, the model predictions were variable across groups and events. In males 65-75 years, there was overlap in the 95% confidence intervals between the observed and the predicted mean for all events, except falls and stroke, although there was overlap in the 99% confidence intervals for predicted and observed mean for falls. The predictions for death, falls, hip fracture and stroke in males 65-75 years, were more than 10% above the observed mean.

Internal validation
In females 65-75 years, there was only overlap in the 95% confidence intervals between the observed and the predicted mean for delirium, disability, falls and hip fracture, although there was overlap in the 99% confidence intervals for hospital admission and mean predicted frailty fell. Predictions fell more than 10% above observed values for death, hospital admissions and stroke, but below 10% of the observed mean for delirium and polypharmacy at 48 months in this group.
In males over 75 years, there was only overlap in the 95% confidence intervals between observed and predicted incidence for delirium, hip fracture, polypharmacy and mean frailty, although there was overlap in 99% confidence intervals for disability and falls. All predictions for events and attributes, except for delirium and polypharmacy were higher than 10% above the observed mean.
For females over 75 years, there was only overlap in confidence intervals between observed and predicted incidence for delirium and hip fracture and polypharmacy, although there was overlap between the 99% confidence intervals for disability and stroke. All predictions, except for delirium and polypharmacy exceeded 10% above the observed mean in this group.

External validation
The external validation results for the model using the HILDA data are presented in Table 5, which shows the calibrated results using 1,271 convergent sets of sampled model input parameters. A table of the calibration multipliers applied to the transition probabilities is provided in the appendix. Estimates for the HILDA population from the calibrated model fall within the 95% confidence interval for the observed mean results at 48 months. Table 6 shows the predicted prevalence of each frailty phenotype category and the predicted mean frailty phenotype by age cohort compared to their observed values for the SHARE and the HILDA data. The model predictions were close to observed values for the prevalence of each frailty phenotype category and for mean frailty at 48 months in the SHARE data. The calibrated model predictions were close to observed values for the prevalence of each frailty phenotype category and for mean frailty at 48 months in the HILDA data, except for mean frailty value predicted for males in the age cohort 65-69 years. This was due to the small number of available observations for this age cohort as 65 years is the lowest age of entry for participants in the model, which may be creating the unexpectedly low numbers of males with zero frailty phenotypes in this age category in the observed data.

Discussion
This paper reports on the development, validation and calibration of an individual-based state transition model for the prediction of frailty and seven frailty-related events for a cohort of older Australians. Using a model structure previously validated by experts [25], the parameterisation of our simulation model was informed by the experience of a large heterogenous European population (SHARE) with external validation and calibration to a large Australian cohort (HILDA). A strength of our model is that it can estimate outcomes for highly heterogenous patient cohorts using a recognised measure of frailty. Our use of a four factor Frailty phenotype score also makes the model sensitive to transitions between frailty states to better capture intervention effects. The individual-based state transition model is an improvement on the previous Markov model [24] as it can more accurately capture the changing interdependencies between events and individual characteristics over time, which better represents the progression of frailty and its outcomes.
Internal validation showed the model predicted frailty-related events by age and gender at 24 months in the SHARE population with reasonable accuracy but was less accurate at predicting events at 48 months. The model required considerable calibration to accurately predict outcomes and characteristics by age and gender at 48 months for a different population in the external validation, which may be partly explained by differences in the data sources. The HILDA population was less frail at baseline and more highly educated than the SHARE population.
The two data sources used in the model construction share the difficulties associated with self-report, particularly in capturing events over time, with both SHARE and HILDA typically recording binary indicators of characteristics and events to indicate occurrence over a range of specified time frames. These differences are also likely exacerbated by differences in the data collected, particularly in the variables underpinning the construction of the frailty phenotype. The variable which showed the biggest difference in the two groups was the measure for exhaustion. In the SHARE data, the question informing this value involved a comparison of current and previous experience, e.g., "have you had any energy to do the things you used to do?", whereas for the HILDA data this measure was reliant on a much more general question regarding current levels of energy without reference to previous levels of energy. Other differences include the measure of grip strength, which is an objective measure in the SHARE data, whereas in the HILDA data it is a self-report measure. Conversely the measure of weight loss is based on BMI is the HILDA data, whereas the SHARE data measures this phenotype with selfreported changes in appetite. Slight differences also occur in the variables used to define characteristics and events in the model. The measure of stroke in the HILDA data includes other serious circulatory conditions, unlike the SHARE data which records stroke specifically. Difference in polypharmacy were even more marked with the HILDA recording the number of prescription medicines taken on a regular basis, compared to SHARE reporting the number of drugs taken in a week for a defined set of conditions.
Another explanation for the model's poor performance over 48 months duration may be the use of logistic regression models, which assume time-constant monthly prediction of events, which means misrepresentations in the models cumulates the effects over 48 months. The estimation of transition probabilities for events in the model was complicated by the lack of specific time of events. The multinomial logistic model used to predict changes in frailty represented changes in frailty states across all levels of phenotype, for which low numbers of observed transitions between some frailty levels may have reduced the accuracy of the model. While other methods were attempted, such as zero-inflated beta models, these were not found to produce better prediction and made more restrictive assumptions regarding pathways through models. The lack of more specific time to event data and the limited available number of waves at the time of model construction meant that we were unable to fit reliable risk equations using survival curves approaches which would have avoided the need for the assumption of time constancy in the transition probabilities. However, the constructed model is sufficiently flexible to allow for updates to the underlying sub models that underpin the individualbased state-transition model. The application of the regression models over monthly cycles in the simulation model cumulates the effects of any inaccuracies in the models.
With an aging population, Frailty prevalence is likely to increase with detrimental effects on both quality of life for affected people and health service costs. Published results of frailty interventions tend to only consider outcomes during trial duration, but there are likely to be longer-term costs and effects that should be estimated when evaluating frailty interventions. The calibrated model accurately predicts frailty and frailty-related events and attributes by age and gender over 48 months in a community-dwelling Australian population aged 65 and over, with a range of underlying characteristics and event histories. The next steps include the assignment costs and outcomes (e.g., utility values) to the predicted events to enable the model to be used for the economic evaluation of frailty interventions. A longer term goal will be to update the model as more longitudinal data becomes available, such that the evaluation tool will capture important longer term effects of interventions designed to delay or prevent the onset of frailty.